METHOD AND APPARATUS FOR ADAPTIVE COMPRESSION OF 

SCANNED DOCUMENTS 

Inventor: Xin Li 

Field of the Invention 

This invention relates to scalable lossy compression of scanned documents. Its 
applications include intelligent document management systems and the electronic delivery and 
reproduction of documents over the Internet. 

Background of the Invention 

This invention deals with the problem of lossy coding of scanned documents, i.e., 
compressing scanned documents to a significantly reduced bit rate at the price of a degradation 
in quality. Current popular approaches, such as those described by L. Bottou et al., High quality 
document image compression using DjVu, Journal of Electronic Imaging, Vol. 7, pp. 410-425, 
July 1998, and R. L. de Queiroz et al., Optimizing block-thresholding segmentation for 
multiplexer compressing of compound images, IEEE Trans. on Image Processing, Vol. 9, 
pp. 1461-1471, Sep. 2000, belong to the so-called Mixed Raster Content (MRC)-based 
approaches. They decompose a document into background, foreground and mask layers, and 
then compress each layer separately. The principal weakness of such a layer-based approach is 
its intrinsic redundancy. Moreover, rate control and scalability are not handled efficiently 
in layer-based approaches. 

U.S. Patent No. 5,778,092, to MacLeod et al., granted July 7, 1998, for Method 
and apparatus for compressing color or gray scale documents, describes a technique for 
compressing a color or gray scale pixel map representing a document. 

A. Said and A. Drukarev, Simplified segmentation for compound image 
compression, Proceedings of ICIP 1999, discusses the relative advantages of object-based, 
layer-based and block-based segmentation schemes. 

M. J. Weinberger, The LOCO-I lossless image compression algorithm: principles 
and standardization into JPEG-LS, IEEE Trans. on Image Processing, Vol. 9, No. 8, 
pp. 1309-1324, August 2000, describes the LOw COmplexity LOssless COmpression for Images 
(LOCO-I) compression algorithm. 

Summary of the Invention 
A method of compressing a document includes preparing an encoded 
representation of a document by scanning the document to provide a scanner output; classifying 
the scanner output as belonging to a class of document taken from the document classes 
consisting of smooth, text, graphics and image; and adaptively compressing the scanner output as 
a function of the class of the document. 

A compression apparatus for compressing scanned data includes a scanner for 
scanning a document and generating a scanner output; a block-based classifier for classifying the 
scanner output as belonging to a class of documents taken from the document classes consisting 
of smooth, text, graphics and image; an adaptive compressor for compressing the scanner output 
according to a compression mode as a function of the class of document; a storage mechanism 
for storing compressed scanner output and compression mode information; and a decompressor 
for decompressing compressed scanner output in accordance with the compression mode 
information. 

An object of the invention is to provide an efficient engine and method for 
compressing the data file of a scanned document. 

Another object of the invention is to provide a compression technique which is a 
function of the type of subject matter contained in the scanned document. 

This summary and the objects of the invention are provided to enable quick 
comprehension of the nature of the invention. A more thorough understanding of the invention 
may be obtained by reference to the following detailed description of the preferred embodiment 
of the invention in connection with the drawings. 

Brief Description of the Drawings 
Fig. 1 depicts histograms of blocks belonging to different document classes: Fig. 1(a) 
- smooth; Fig. 1(b) - text; Fig. 1(c) - graphic; and Fig. 1(d) - image. 

Fig. 2 depicts an original document to be scanned and compressed. 
Fig. 3 depicts the classification map for the document of Fig. 2. 
Fig. 4 depicts a comparison of an original text document (Fig. 4(a)), and decoded 
prints of the text after processing by the BCAC of the invention (Fig. 4(b)), by DjVu (Fig. 4(c)), 
and by JPEG2000 (Fig. 4(d)). JPEG 2000 may be found at http://www.jpeg.org/JPEG2000.htm. 

Fig. 5 depicts a comparison of an original graphic document (Fig. 5(a)), and 
decoded prints of the graphic after processing by the BCAC of the invention (Fig. 5(b)), by DjVu 
(Fig. 5(c)), and by JPEG2000 (Fig. 5(d)). 

Fig. 6 depicts a comparison of an original image document (Fig. 6(a)), and 
decoded prints of the image after processing by the BCAC of the invention (Fig. 6(b)), by DjVu 
(Fig. 6(c)), and by JPEG2000 (Fig. 6(d)). 

Fig. 7 depicts an original text region (Fig. 7(a)), and the decoded text region before 
post-processing (Fig. 7(b)) and after post-processing (Fig. 7(c)) by the BCAC of the invention. 
Fig. 8 is a block diagram of the apparatus of the invention. 

Detailed Description of the Preferred Embodiments 
The Block-based Classification and Adaptive Compression (BCAC) coder of the 
invention provides a novel solution to the lossy compression of scanned documents. It structures 
the image into non-overlapping blocks and does the classification on a block-by-block basis. 
Depending on the classification results, the block is adaptively compressed using one of four 
different standard compression methods: singular mode, binary mode, M-ary mode and 
continuous mode. All of the generated symbols are coded by an adaptive binary arithmetic coder 
(QM-coder). The overall complexity is kept comparable to that of JPEG2000. The BCAC 
coder and method of the invention include two stages: a block-based classification stage and an 
adaptive compression or coding stage, which are detailed separately as follows: 

Block-based Classification 

The classification is based on empirical statistics collected from the data within 
the block. Fig. 1 depicts several examples of typical histograms, showing pixel value on the 
x-axis and frequency count on the y-axis, of blocks belonging to the different classes: Fig. 1(a) - 
smooth class, e.g., having only one dominant value; Fig. 1(b) - text class, e.g., having two 
dominant values; Fig. 1(c) - graphics class, e.g., having more than two dominant values; and 
Fig. 1(d) - image class, e.g., having no dominant values. The smooth blocks (Fig. 1(a)) and the 
image blocks (Fig. 1(d)) may be identified by whether the distribution has only one dominant 
value or none. The distinction between a text block (Fig. 1(b)) and a graphics block 
(Fig. 1(c)) is subtle because the anti-aliasing effect found around text regions may introduce a 
false mode detection. One way to overcome this difficulty is to exploit the contrast information. 
A block is classified as a text block only if its contrast is larger than a preselected contrast 
threshold, Th. Meanwhile, to facilitate the following coding stage, the classification method of 
the invention enforces the following order of priority on the classification: smooth > text > 
graphics > image. This ordering is the result of observations that 
compressing a block in a mode with a higher priority than warranted seriously affects 
the visual quality of the block upon decompression. For example, when a graphics block is 
coded in binary mode, one or more colors may disappear. 

Based on the above discussion, a sequential classification scheme is used. The 
classification is summarized in the following three sequential steps: 

1) The histogram function, f(c), and the max/min values within a BxB block are obtained. If 
the difference between max and min is below a preselected threshold, e.g., 8, the block is 
classified as a smooth block. A pixel value c is said to be dominant if f(c) > f(c+1) and 
f(c) > f(c-1) and f(c) > th_1 = 0.1, where th_1 is the threshold for determining a dominant 
value. 

2) Otherwise, the first two dominant values c_1, c_2 are found and the cumulative probability, p 
(summation over [c_1-Δ, c_1+Δ] and [c_2-Δ, c_2+Δ]), is calculated. If |c_1-c_2| > 128 and p > 
Th, the block is classified as a text block. 

3) Otherwise, all valid dominant values are found and the cumulative probability is 
calculated. If the number of dominant values n belongs to [1,M] and the cumulative 
probability p (summation over all [c_k-Δ, c_k+Δ] for k=1,...,n) > th_2, where th_2 is the 
threshold used to distinguish graphics blocks from image blocks, the block is classified as a 
graphics block. Otherwise, the block is classified as an image block. 

Fig. 2 depicts the original test image used to acquire experimental data. The classification 
results for B=32, Δ=M=4, th_1 = 0.10 and th_2 = 0.75, shown in Fig. 3, are seen to be reasonably 
satisfactory. The classification result for each block is explicitly transmitted to the decoder. 
Therefore the classification is performed only at the encoder. Such an asymmetric structure is 
desirable in many real applications, e.g., browsing scanned documents over the Internet using a 
rather simple decoder. 
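
The three sequential steps may be sketched in Python as follows. This is an illustrative sketch rather than the coder of the invention: the function name classify_block and its parameter names are hypothetical, the "first two dominant values" are assumed to be the two most frequent ones, overlapping windows are counted once, and the text threshold Th, whose value is not stated above, is given a placeholder default of 0.75.

```python
from collections import Counter

def classify_block(pixels, delta=4, max_modes=4, th1=0.10, th2=0.75,
                   smooth_range=8, min_contrast=128, th_text=0.75):
    """Classify one block, given as a flat list of 8-bit gray values.

    th_text stands in for the threshold Th; 0.75 is a placeholder
    assumption, not a value taken from the description.
    """
    n = len(pixels)
    # Step 1: a small dynamic range marks a smooth block.
    if max(pixels) - min(pixels) < smooth_range:
        return "smooth"

    counts = Counter(pixels)
    f = [counts[c] / n for c in range(256)]  # normalized histogram f(c)

    def f_at(c):
        # Histogram value, taken as zero outside the 8-bit range.
        return f[c] if 0 <= c <= 255 else 0.0

    # A value is dominant if it is a local peak whose frequency exceeds th1.
    dominant = [c for c in range(256)
                if f[c] > th1 and f[c] > f_at(c - 1) and f[c] > f_at(c + 1)]

    def mass(centers):
        # Cumulative probability over the union of [c - delta, c + delta].
        covered = set()
        for c in centers:
            covered.update(range(max(0, c - delta), min(255, c + delta) + 1))
        return sum(f[v] for v in covered)

    # Step 2: two high-contrast dominant values carrying most of the mass.
    if len(dominant) >= 2:
        c1, c2 = sorted(dominant, key=lambda c: f[c], reverse=True)[:2]
        if abs(c1 - c2) > min_contrast and mass([c1, c2]) > th_text:
            return "text"

    # Step 3: a few dominant values carrying most of the mass.
    if 1 <= len(dominant) <= max_modes and mass(dominant) > th2:
        return "graphics"
    return "image"
```

Because each test is tried in turn and the first match returns, the priority order smooth > text > graphics > image follows from the control flow itself.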

Adaptive Compression and Coding 

Based on the classification result, the block is compressed in one of the following 
four modes: singular mode, binary mode, M-ary mode and continuous mode. 

Singular mode. The compression of a smooth block is simple. Only the mean value needs to be 
transmitted to the decoder. Spatial scalability is also straightforward. 
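
A minimal sketch of the singular mode, under stated assumptions, is given below. The function names are hypothetical, and rounding the mean to the nearest integer is an assumption; the description only states that the mean value is transmitted.

```python
def encode_singular(block):
    """Return the mean of a block (list of pixel rows), rounded half up."""
    total = sum(sum(row) for row in block)
    count = sum(len(row) for row in block)
    return (2 * total + count) // (2 * count)

def decode_singular(mean, size):
    """Rebuild a size x size block filled with the transmitted mean."""
    return [[mean] * size for _ in range(size)]
```

Spatial scalability is straightforward in this sketch because the same mean value can be expanded to any block size requested by the decoder.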

Binary mode. The block is first quantized into a binary map, and then a progressive JBIG-like 
coder is used to compress the binary map (JBIG, Progressive bi-level image compression, 
International Standard, ISO/IEC 11544, 1993). The values of c_1, c_2 are transmitted to the decoder 
as side information. It should be noted that a post-processing technique, e.g., the 
low-pass filter described in Table 1, may be used to simulate the anti-aliasing effect at the 
decoder. A simulation example is provided later herein to demonstrate the visual effect of 
low-pass filtering on decoded text blocks. 

Table 1: Low-pass filter simulating the anti-aliasing effect. 

Though the low-pass filter intentionally "blurs" the sharp edges around the text regions, the 
mean-square error (MSE) relative to the original image is significantly reduced. The text 
blocks are also visually closer to the original ones after the post-processing. 
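
The binary mode and its post-processing may be illustrated by the following sketch. Several parts are assumptions made only for illustration: the nearest-value quantization rule, the function names, the edge-replicating boundary handling, and in particular the 3x3 kernel, which is a generic smoothing kernel and not the filter of Table 1. The JBIG-like coding of the binary map is omitted.

```python
def binarize(block, c1, c2):
    """Map each pixel to 0 if it is closer to c1, otherwise to 1."""
    return [[0 if abs(p - c1) <= abs(p - c2) else 1 for p in row]
            for row in block]

def reconstruct(bitmap, c1, c2):
    """Decoder side: expand the binary map using the side information."""
    return [[c2 if b else c1 for b in row] for row in bitmap]

# Hypothetical smoothing kernel (an assumption; Table 1 is not reproduced).
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

def low_pass(img, kernel=KERNEL):
    """Apply a 3x3 low-pass filter, replicating pixels at the borders."""
    h, w = len(img), len(img[0])
    norm = sum(map(sum, kernel))
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + 1][dj + 1] * img[ii][jj]
            out[i][j] = (acc + norm // 2) // norm  # rounded average
    return out
```

Applied to a reconstructed two-level block, the filter produces intermediate gray values along the edges, which is the simulated anti-aliasing effect described above.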
M-ary mode. The coding in this mode is similar to the compression of a palette-based image. 
Since only a small palette (M=4) is allowed, a direct context-based entropy coding scheme is 
suitable. If the nearest two causal neighbors are considered, there are 4^2 = 16 different contexts in 
total. Spatial scalability is a more challenging problem in this mode. Classical linear transforms, 
such as wavelet transforms, fail to preserve level sets and thus do not lead to efficient coding of 
palette-based images. An extension of the lifting scheme of W. Sweldens, The 
Lifting Scheme: A new philosophy in biorthogonal wavelet constructions, Wavelet Applications 
in Signal and Image Processing III, pp. 68-79, Proc. SPIE 2569, 1995, is used to obtain a 
level-set preserving multi-resolution decomposition of the palette image. For simplicity, the 
low-resolution image is obtained directly by downsampling the high-resolution image, 
i.e., s(i,j) = x(2i,2j). The image s(i,j) is used to predict the other three quarters of the image x(i,j): 

\hat{x}(2i, 2j+1) = s(i,j), 
\hat{x}(2i+1, 2j) = s(i,j), 
\hat{x}(2i+1, 2j+1) = P[x(2i,2j), x(2i+1,2j), x(2i,2j+1)], 

where \hat{x} denotes the predicted value and P[.] is a modified median edge detection 
predictor, which directly uses x(2i,2j) to predict x(2i+1,2j+1) when no horizontal or vertical 
edge is detected. The prediction residue is generated by e = (x - \hat{x}) mod M, and its 
reversibility is achieved by x = (e + \hat{x}) mod M. Empirical studies show that the overall 
bit rate increases by about 10% to 30% with the multi-resolution constraint. 
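
The multi-resolution decomposition and its modular residue may be sketched as follows. The function names are hypothetical, and predict_diagonal is only a stand-in for P[.]: the exact modification of the median edge detection predictor is not specified above, so the rule used here (follow the neighbour across a detected change, otherwise reuse x(2i,2j)) is an assumption. Entropy coding of the residues is omitted.

```python
M = 4  # palette size used in the description

def residue(x, xhat, m=M):
    """Prediction residue e = (x - xhat) mod M."""
    return (x - xhat) % m

def restore(e, xhat, m=M):
    """Inverse mapping x = (e + xhat) mod M."""
    return (e + xhat) % m

def predict_diagonal(a, b, c):
    """Hypothetical stand-in for P[a, b, c] with a = x(2i,2j),
    b = x(2i+1,2j) and c = x(2i,2j+1); always returns an input label."""
    if a == b:
        return c   # no change towards b: follow c (equals a if no edge)
    if a == c:
        return b   # no change towards c: follow b
    return a       # fallback when both neighbours differ from a

def decompose(x, m=M):
    """Split an even-sized label image into s(i,j) and residue triples."""
    h, w = len(x) // 2, len(x[0]) // 2
    s = [[x[2 * i][2 * j] for j in range(w)] for i in range(h)]
    res = []
    for i in range(h):
        for j in range(w):
            a, b, c = x[2 * i][2 * j], x[2 * i + 1][2 * j], x[2 * i][2 * j + 1]
            d = x[2 * i + 1][2 * j + 1]
            res.append((residue(c, a, m), residue(b, a, m),
                        residue(d, predict_diagonal(a, b, c), m)))
    return s, res

def recompose(s, res, m=M):
    """Rebuild the full-resolution image from s(i,j) and the residues."""
    h, w = len(s), len(s[0])
    x = [[0] * (2 * w) for _ in range(2 * h)]
    triples = iter(res)
    for i in range(h):
        for j in range(w):
            a = s[i][j]
            ec, eb, ed = next(triples)
            c, b = restore(ec, a, m), restore(eb, a, m)
            d = restore(ed, predict_diagonal(a, b, c), m)
            x[2 * i][2 * j], x[2 * i][2 * j + 1] = a, c
            x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1] = b, d
    return x
```

Because every residue is reduced modulo M, it stays within the palette alphabet {0, ..., M-1}, and the decoder can recover each label exactly once the low-resolution image s(i,j) is known.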

Continuous mode. Scalable compression of image blocks has been extensively studied in 
recent years. Wavelet-based coders have demonstrated the very best compression performance 
while offering flexible scalability features. Here, the normalized S+P wavelet transform, 
described by A. Said and W. A. Pearlman, An image multiresolution representation for lossless 
and lossy image compression, IEEE Trans. on Image Processing, Vol. 5, pp. 1303-1310, Sept. 
1993, is used for its computational efficiency. Because the transform works on a block-by-block 
basis, a symmetric extension technique is used at the block boundaries to alleviate potential 
block artifacts. Wavelet coefficients are scanned and coded in a bitplane-by-bitplane order. A 
two-stage coding technique, similar to the LZC coder proposed by Taubman et al., Multirate 3D 
subband coding of video, IEEE Trans. on Image Processing, Vol. 3, No. 5, pp. 572-588, Sep. 
1994, is employed. At the first stage (zero coding), the positions of significant coefficients are 
first transmitted by a JBIG-like coder; at the second stage (refinement coding), the magnitudes of 
significant coefficients are coded after a binary expansion. In order to keep the overall 
computational complexity low, no rate-distortion optimization technique is used. 
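
The two-stage bitplane scan may be illustrated by the sketch below, which emits raw zero-coding and refinement bits for a list of integer coefficients. It is a simplified sketch under stated assumptions: the function names are hypothetical, the sign bit is sent immediately after a coefficient becomes significant (sign handling is not specified above), and the JBIG-like and arithmetic coding of the emitted bits is omitted.

```python
def encode_bitplanes(coeffs):
    """Return (number of planes, [(zero_pass, refine_pass), ...])."""
    mags = [abs(c) for c in coeffs]
    nplanes = max(mags, default=0).bit_length()
    significant = [False] * len(coeffs)
    planes = []
    for p in range(nplanes - 1, -1, -1):          # most significant first
        zero_pass, refine_pass = [], []
        for k, m in enumerate(mags):
            bit = (m >> p) & 1
            if significant[k]:
                refine_pass.append(bit)           # stage 2: refinement
            else:
                zero_pass.append(bit)             # stage 1: significance
                if bit:
                    zero_pass.append(1 if coeffs[k] < 0 else 0)  # sign
                    significant[k] = True
        planes.append((zero_pass, refine_pass))
    return nplanes, planes

def decode_bitplanes(n, nplanes, planes):
    """Rebuild n coefficients from the planes received so far."""
    mags, neg, significant = [0] * n, [False] * n, [False] * n
    for idx, (zero_pass, refine_pass) in enumerate(planes):
        p = nplanes - 1 - idx
        z, r = iter(zero_pass), iter(refine_pass)
        for k in range(n):
            if significant[k]:
                mags[k] |= next(r) << p
            elif next(z):
                mags[k] |= 1 << p
                neg[k] = bool(next(z))
                significant[k] = True
    return [-m if s else m for m, s in zip(mags, neg)]
```

Decoding only the first few planes yields a coarser approximation of every coefficient, which is how a bitplane-ordered stream provides quality scalability.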
Results 

The BCAC coder of the invention is compared to the popular DjVu coder and the JPEG2000 VM8 
coder using the flower image of Fig. 2, with the size of 1728, or, as used in the actual 
experiment, 2016. This image is used because it contains abundant text/graphic blocks as well 
as image blocks. Though the JPEG2000 standard was not developed for compressing compound 
images, it may be used as a reference in the comparison. In the BCAC coder, all symbols are coded by an 
adaptive binary arithmetic coder. The overall computational complexity of the BCAC coder 
appears to be acceptable. For example, it takes around 5 seconds for the JPEG2000 or BCAC coder 
to compress the flower image on a Pentium-III 866 MHz machine, while DjVu requires in excess of 
10 seconds. Figs. 4-6 depict the original and decoded text, graphic and image regions taken from 
the images decoded by the three different coders. The actual bytes used by BCAC, DjVu and 
JPEG2000 are 129234, 138312 and 130622, respectively, which correspond to a bit rate of 
about 0.3 bpp. The PSNR results achieved by BCAC, DjVu and JPEG2000 are 27.8 dB, 21.0 dB 
and 31.4 dB, respectively. Though JPEG2000 achieves the highest PSNR, its subjective 
quality is not the best. Indeed, PSNR values do not faithfully reflect the visual quality of a 
compound image, especially for the text and graphics blocks. It is easy to observe that the BCAC 
coder achieves much better performance than the DjVu and JPEG2000 coders in terms of subjective 
quality. Because the quality of text/graphics blocks is preserved by a handful of bits, more bits 
may be spent to code the image blocks and achieve better visual quality. In Fig. 7, the text 
blocks before and after post-processing are compared. The block after the low-pass filtering is 
seen to more accurately represent the original block. The PSNR improvement is about 2.3 dB. 
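
The reported bit rate can be checked with a short calculation. The sketch below assumes that the two numbers given for the image size are its width and height in pixels, i.e., 1728 x 2016; this reading is an assumption and is not stated explicitly above. It also uses the standard definition PSNR = 10 log10(255^2 / MSE) for 8-bit images to express a PSNR gain as an MSE ratio. The function names are hypothetical.

```python
def bits_per_pixel(num_bytes, width, height):
    """Bit rate in bits per pixel for a compressed file of num_bytes."""
    return 8 * num_bytes / (width * height)

def mse_ratio_for_psnr_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10 ** (gain_db / 10)

WIDTH, HEIGHT = 1728, 2016   # assumed image dimensions (see lead-in)
SIZES = {"BCAC": 129234, "DjVu": 138312, "JPEG2000": 130622}
RATES = {name: bits_per_pixel(n, WIDTH, HEIGHT) for name, n in SIZES.items()}
```

Under this assumption all three file sizes round to 0.3 bpp, consistent with the figure quoted above, and the 2.3 dB improvement from post-processing corresponds to an MSE reduced by a factor of roughly 1.7.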

The BCAC coder apparatus of the invention is depicted in Fig. 8, generally at 10. 
A scanner 12 provides a scanner output in the form of a file containing the digital data generated 
from the document in question. A block-based classifier 14 includes histogram generator 16 and 
a threshold selection mechanism 18. The threshold selection mechanism is most likely a manual 
input device, wherein a user sets the various contrast threshold values. Classifier 14 provides an 
output which includes the scanner output and a flag to identify the class of document associated 
with the scanner output. 

An adaptive compressor, or coder 20, applies the proper compression mode to the 
scanner output, which mode is associated with the scanner output. The compressed scanner output and the 
compression mode information may be stored in a storage device 22. The compressed scanner output and 
the mode information are directed to a decompressor/decoder for processing to "revive" the 
document. 

Though the coder of the method of the invention is designed for compressing 
scanned documents that contain significant noise, it is also applicable to the lossy compression of 
computer-generated documents that have little noise. Meanwhile, it is easy to generalize this 
scheme to compress color documents. Spatial scalability is an attractive feature provided by this 
approach. Reproduction of the scanned documents at various resolutions is useful in many 
important applications, e.g., intelligent document management systems can render scanned 
documents at the resolutions specified by the user. 

Thus, a method and apparatus for compressing scanned documents have been 
disclosed. It will be appreciated that further variations and modifications thereof may be made 
within the scope of the invention as defined in the appended claims. 